Android Upper-Layer WatchDog Study Notes
I. Overview
1. Understanding how WatchDog works makes it easier to understand how the system services run.

II. WatchDog Implementation

1. Where the code lives

    //frameworks/base/services/core/java/com/android/server/Watchdog.java
    public class Watchdog extends Thread {
        ...
    }

As the declaration shows, Watchdog is a thread.

2. WatchDog is started in SystemServer.java

    run()    //SystemServer.java
        startBootstrapServices()    //SystemServer.java
            traceBeginAndSlog("StartWatchdog");
            final Watchdog watchdog = Watchdog.getInstance();
            watchdog.start();
            traceEnd();
            ...
            traceBeginAndSlog("InitWatchdog");
            watchdog.init(mSystemContext, mActivityManagerService);
            traceEnd();

So Watchdog is an auxiliary thread running inside SystemServer. Because it is a thread, calling start() is all it takes to get it running.

3. The WatchDog constructor

    private Watchdog() {
        super("watchdog");
        // not checking the background thread, shared foreground thread is the main checker. Thread name: "android.fg"
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(), "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread. only do a quick check since there can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()), "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread. Thread name: "android.ui"
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(), "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread. Thread name: "android.io"
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(), "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread. Thread name: "android.display"
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(), "display thread", DEFAULT_TIMEOUT));
        // And the animation thread. Thread name: "android.anim"
        mHandlerCheckers.add(new HandlerChecker(AnimationThread.getHandler(), "animation thread", DEFAULT_TIMEOUT));
        // And the surface animation thread. Thread name: "android.anim.lf"
        mHandlerCheckers.add(new HandlerChecker(SurfaceAnimationThread.getHandler(), "surface animation thread", DEFAULT_TIMEOUT));
        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());
        mOpenFdMonitor = OpenFdMonitor.create();
        HandlerThread handlerThread = new HandlerThread("workThread");    // the "workThread" thread inside system_server
        handlerThread.start();
        mWorkHandler = new Handler(handlerThread.getLooper()) {
            @Override
            public void handleMessage(Message msg) {
                switch (msg.what) {
                    case MESSAGE_AFE_CHECK_ERROR:
                        checkAfeStatus(false);
                        break;
                    case MESSAGE_AFE_CHECK_OVER:
                        Slog.i(TAG, "release observer");
                        mFileObserver.stopWatching();
                        mFileObserver = null;
                        checkAfeStatus(true);
                        getLooper().quitSafely();
                        mWorkHandler = null;
                        break;
                }
            }
        };
        // See the notes on DEFAULT_TIMEOUT.
        assert DB || DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
    }

Two objects deserve attention: mMonitorChecker and mHandlerCheckers.

Where the elements of the mHandlerCheckers list come from:
(1) Added in the constructor: UiThread, IoThread, DisplayThread, FgThread
(2) Added from outside: Watchdog.getInstance().addThread(handler);

Where the elements of the mMonitorChecker list come from:
(1) Added from outside: Watchdog.getInstance().addMonitor(monitor);
(2) Special case: addMonitor(new BinderThreadMonitor());

3. The WatchDog run() method

    public void run() {
        while (true) {
            ...
            synchronized (this) {
                for (int i=0; i ...
    ... > 0)) {
            mCompleted = true;
            return;
        }
        if (!mCompleted) {
            // we already have a check in flight, so no need
            return;
        }
        mCompleted = false;
        mCurrentMonitor = null;
        mStartTime = SystemClock.uptimeMillis();
        mHandler.postAtFrontOfQueue(this);
    }

The mMonitors.size() == 0 case exists mainly to check whether the elements of mHandlerCheckers have timed out. The technique is mHandler.getLooper().getQueue().isPolling(): if the looper is currently polling for more work, its thread is idle rather than stuck, so the check is marked complete without posting anything. The monitor list of mMonitorChecker always has more than zero elements, so for that checker the point of interest is mHandler.postAtFrontOfQueue(this):

5. The HandlerChecker run() method

    public final class HandlerChecker implements Runnable {
        ...
        @Override
        public void run() {
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }
            synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }
        ...
    }

The technique here is to call each monitor() method.

(1) The loop runs monitor() on the entries of mMonitors, and the only checker that has any entries is mMonitorChecker. For example, the services below register themselves through addMonitor:

    Watchdog.getInstance().addMonitor(this);    //ActivityManagerService.java
    Watchdog.getInstance().addMonitor(this);    //InputManagerService.java
    Watchdog.getInstance().addMonitor(this);    //PowerManagerService.java
    Watchdog.getInstance().addMonitor(this);    //WindowManagerService.java

The monitor() method that gets executed is very simple. For example, the one in ActivityManagerService:

    public void monitor() {
        synchronized (this) { }
    }

It only checks whether the system service's lock has been held for too long: if another thread holds the lock, monitor() blocks until the lock is released.

(2) Special case: the check performed by BinderThreadMonitor

    private static final class BinderThreadMonitor implements Watchdog.Monitor {
        @Override
        public void monitor() {
            Binder.blockUntilThreadAvailable();
        }
    }

    //frameworks/base/core/java/android/os/Binder.java
    public static final native void blockUntilThreadAvailable();

    //frameworks/native/libs/binder/IPCThreadState.cpp
    void IPCThreadState::blockUntilThreadAvailable()
    {
        pthread_mutex_lock(&mProcess->mThreadCountLock);
        while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
            ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                    static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                    static_cast<unsigned long>(mProcess->mMaxThreads));
            pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
        }
        pthread_mutex_unlock(&mProcess->mThreadCountLock);
    }

This only checks that the number of executing Binder threads in the process stays below mMaxThreads. Once the count reaches the maximum (31 for system_server), the caller has to wait. By default each process has at most 15 Binder threads, but system_server raises its own limit to 31:

    //frameworks/native/libs/binder/ProcessState.cpp
    #define DEFAULT_MAX_BINDER_THREADS 15

    //frameworks/base/services/java/com/android/server/SystemServer.java
    public final class SystemServer {
        private static final int sMaxBinderThreads = 31;

        private void run() {
            BinderInternal.setMaxThreads(sMaxBinderThreads);    // set before any service is started
            ...
            startBootstrapServices();
        }
    }

6. What WatchDog does after a timeout

    public void run() {
        ...
        Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
        WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
        Slog.w(TAG, "*** GOODBYE!");
        Process.killProcess(Process.myPid());
        System.exit(10);
    }

It kills the process it runs in (system_server) and exits.

III. WatchDog Log Output

1. process stack traces

The output path is controlled by dalvik.vm.stack-trace-file or dalvik.vm.stack-trace-dir and is normally /data/anr. The traces are written by calling ActivityManagerService.dumpStackTraces().

    public class Watchdog extends Thread {    //Watchdog.java
        public void run() {
            while (true) {
                if (!fdLimitTriggered) {
                    if (waitState == WAITED_HALF) {
                        if (!waitedHalf) {
                            Slog.i(TAG, "WAITED_HALF");
                            // We've waited half the deadlock-detection interval. Pull a stack
                            // trace and wait another half.
                            ArrayList<Integer> pids = new ArrayList<Integer>();
                            pids.add(Process.myPid());
                            ActivityManagerService.dumpStackTraces(pids, null, null, getInterestingNativePids());
                        }
                    }
                }
                final File stack = ActivityManagerService.dumpStackTraces(pids, null, null, getInterestingNativePids());
            }
        }
    }

Note that process stack traces are also written at the halfway point of the timeout, that is, in the WAITED_HALF state.

2. slog

    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
    Slog.w(TAG, "*** GOODBYE!");

3. event log

    EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

4. kernel stack traces

    // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
    doSysRq('w');
    doSysRq('l');

This triggers two sysrq commands, show-blocked-tasks (w) and show-backtrace-all-active-cpus (l), to obtain backtraces of the threads in D state and of the active CPUs. The output goes to the kernel log.

5. dropbox

    Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
        public void run() {
            // If a watched thread hangs before init() is called, we don't have a
            // valid mActivity. So we can't log the error to dropbox.
            if (mActivity != null) {
                mActivity.addErrorToDropBox("watchdog", null, "system_server", null, null, null, subject, null, stack, null);
            }
            StatsLog.write(StatsLog.SYSTEM_SERVER_WATCHDOG_OCCURRED, subject);
        }
    };
    dropboxThread.start();

Note that dropbox entries are normally stored under /data/system/dropbox. The directory is specified here:

    //frameworks/base/services/core/java/com/android/server/DropBoxManagerService.java
    public DropBoxManagerService(final Context context) {
        this(context, new File("/data/system/dropbox"), FgThread.get().getLooper());
    }

IV. Why UiThread, IoThread, DisplayThread and FgThread Are Monitored

1. These four classes extend ServiceThread and are singletons. Take UiThread.java as an example:

    //frameworks/base/services/core/java/com/android/server/UiThread.java
    public final class UiThread extends ServiceThread {
        private UiThread() {
            super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
        }

        @Override
        public void run() {
            // Make sure UiThread is in the fg stune boost group
            Process.setThreadGroup(Process.myTid(), Process.THREAD_GROUP_TOP_APP);
            super.run();
        }

        private static void ensureThreadLocked() {
            if (sInstance == null) {
                sInstance = new UiThread();
                sInstance.start();
                final Looper looper = sInstance.getLooper();
                looper.setTraceTag(Trace.TRACE_TAG_SYSTEM_SERVER);
                looper.setSlowLogThresholdMs(SLOW_DISPATCH_THRESHOLD_MS, SLOW_DELIVERY_THRESHOLD_MS);
                sHandler = new Handler(sInstance.getLooper());
            }
        }

        public static UiThread get() {
            synchronized (UiThread.class) {
                ensureThreadLocked();
                return sInstance;
            }
        }

        public static Handler getHandler() {
            synchronized (UiThread.class) {
                ensureThreadLocked();
                return sHandler;
            }
        }
    }

(1) get() returns the singleton object.
(2) getHandler() returns the Handler that belongs to that thread.
(3) Note that ensureThreadLocked() calls start() right after it creates the instance. In other words, the thread is already running by the time the object is handed out.

Next, all four classes extend ServiceThread, and ServiceThread extends HandlerThread. The Handler inside each thread is the part to focus on, because system services such as AMS, WMS and PMS all post work to these threads.

    //frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
    final class UiHandler extends Handler {
        public UiHandler() {
            super(com.android.server.UiThread.get().getLooper(), null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case SHOW_ERROR_UI_MSG:
                case SHOW_NOT_RESPONDING_UI_MSG:
                case SHOW_STRICT_MODE_VIOLATION_UI_MSG:
                case WAIT_FOR_DEBUGGER_UI_MSG:
                case DISPATCH_PROCESSES_CHANGED_UI_MSG:
                case DISPATCH_PROCESS_DIED_UI_MSG:
                case DISPATCH_UIDS_CHANGED_UI_MSG:
                case DISPATCH_OOM_ADJ_OBSERVER_MSG:
            }
        }
    }

UiHandler takes its Looper directly from UiThread. A thread has exactly one Looper and one MessageQueue, but it can have any number of Handlers. The messages handled in handleMessage show that UI work is not necessarily confined to the main thread (even though the Android documentation says the UI must be updated from the main thread).

2. Differences in usage

    UiThread --> ActivityManagerService
    DisplayThread --> WindowManagerService, InputManagerService, DisplayManagerService
    IoThread --> PackageInstallerService, StorageManagerService, BluetoothManagerService

V. Summary

1. The core objects of Watchdog are mHandlerCheckers and mMonitorChecker.
mHandlerCheckers: monitors whether a message queue is blocked.
mMonitorChecker: monitors whether a core system service has held its lock for too long.
The checkers in mHandlerCheckers rely on mHandler.getLooper().getQueue().isPolling() to decide whether a thread has timed out. mMonitorChecker relies on synchronized(this). Note in particular that BinderThreadMonitor decides by checking whether the number of executing Binder threads has reached the process maximum.
2. After a timeout the system writes a series of logs, and each of them can be used for analysis.
3. After a timeout Watchdog kills its own process, so the pid of system_server changes once the process is restarted.
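
The lock check summarized in point 1 can be imitated outside Android. The following is a minimal plain-Java sketch under stated assumptions, not the framework code: MiniWatchdog, FakeService, MiniWatchdogDemo, the "mini.fg" thread name and the 300 ms timeout are all hypothetical names and values chosen for illustration, and a single-thread ExecutorService stands in for the Handler/Looper of the "android.fg" thread. It only shows the idea that monitor() blocks while the service lock is held, so the checker fails to finish in time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical plain-Java stand-in for the monitor check; not framework code. */
class MiniWatchdog {
    /** Same shape as Watchdog.Monitor: monitor() returns once the lock is free. */
    interface Monitor {
        void monitor();
    }

    private final List<Monitor> monitors = new ArrayList<>();

    // One daemon worker stands in for the "android.fg" handler thread.
    private final ExecutorService checkerThread =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "mini.fg");
                t.setDaemon(true);
                return t;
            });

    void addMonitor(Monitor m) {
        monitors.add(m);
    }

    /** Returns true if every monitor() call completed within timeoutMs. */
    boolean check(long timeoutMs) {
        Future<?> pending = checkerThread.submit(() -> {
            for (Monitor m : monitors) {
                m.monitor();
            }
        });
        try {
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The real Watchdog would dump stacks and kill system_server here.
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Mirrors the ActivityManagerService pattern: monitor() just takes the lock. */
class FakeService implements MiniWatchdog.Monitor {
    @Override
    public void monitor() {
        synchronized (this) { }
    }
}

class MiniWatchdogDemo {
    /** Runs one check, optionally while another thread holds the service lock. */
    static boolean checkWithLock(boolean holdLock) {
        FakeService service = new FakeService();
        MiniWatchdog watchdog = new MiniWatchdog();
        watchdog.addMonitor(service);

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        if (holdLock) {
            Thread holder = new Thread(() -> {
                synchronized (service) {
                    held.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException ignored) {
                        // Fall through and release the lock.
                    }
                }
            }, "lock-holder");
            holder.setDaemon(true);
            holder.start();
            try {
                held.await(); // wait until the lock is really held
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        boolean healthy = watchdog.check(300);
        release.countDown();
        return healthy;
    }

    public static void main(String[] args) {
        System.out.println("lock free: " + checkWithLock(false));
        System.out.println("lock held: " + checkWithLock(true));
    }
}
```

Running MiniWatchdogDemo should report a passing check when the lock is free and a failed check when another thread holds it. The sketch deliberately leaves out the isPolling() shortcut, the WAITED_HALF stage and every logging step; in the real Watchdog a failed check leads to the stack dumps and the process kill described above.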
Reference: Android internals analysis blog, "Android WatchDog Principle Analysis": https://blog.csdn.net/weixin_28543661/article/details/117344345